Overview
Dataset statistics
| Number of variables | 25 |
|---|---|
| Number of observations | 217 |
| Missing cells | 4 |
| Missing cells (%) | 0.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 159.0 KiB |
| Average record size in memory | 750.1 B |
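The two missing-value percentages in this report use different denominators. A minimal arithmetic check, assuming "Missing cells (%)" is missing cells divided by all cells (observations × variables). This differs from the per-variable 1.8% reported for CD4 cell count, which is 4 / 217 rows:

```python
# Dataset-level figures from the table above.
n_observations = 217
n_variables = 25
missing_cells = 4

# Denominator is every cell in the table, not the number of rows.
total_cells = n_observations * n_variables
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.2f}% missing")  # 5425 cells, 0.07% missing
```

Rounded to one decimal place, 0.07% becomes the reported 0.1%.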
Variable types
| Text | 1 |
|---|---|
| Numeric | 9 |
| Categorical | 13 |
| DateTime | 1 |
| Boolean | 1 |
Dataset
| Description | JHB_WRHI_003 - Quality-corrected harmonized data |
|---|---|
| Creator | RP2 Clinical Data Quality Team |
| Author | Quality-Checked Data |
| URL | HEAT Research Projects |
Variable descriptions
| Age (at enrolment) | Patient age at study enrollment |
|---|---|
| CD4 cell count (cells/µL) | CD4+ T lymphocyte count (missing codes removed) |
| HIV viral load (copies/mL) | HIV RNA copies per mL (missing codes removed) |
| BMI (kg/m²) | Body Mass Index (extreme values removed) |
| Waist circumference (cm) | Waist circumference (corrected from mm to cm) |
| ALT (U/L) | Alanine aminotransferase (missing codes removed) |
| Platelet count (×10³/µL) | Platelet count (missing codes removed) |
| Hematocrit (%) | Hematocrit (zero values removed) |
| Lymphocyte count (×10⁹/L) | Lymphocyte absolute count (corrected labeling) |
| Neutrophil count (×10⁹/L) | Neutrophil absolute count (corrected labeling) |
| cd4_correction_applied | Quality flag: CD4 missing codes removed |
| final_comprehensive_fix_applied | Quality flag: Comprehensive corrections applied |
| waist_circ_unit_correction_applied | Quality flag: Waist circ unit corrected |
Alerts
| Alert | Type |
|---|---|
| study_source has constant value "JHB_WRHI_003" | Constant |
| province has constant value "Gauteng" | Constant |
| city has constant value "Johannesburg" | Constant |
| HIV_status has constant value "Positive" | Constant |
| Antiretroviral Therapy Status has constant value "Positive" | Constant |
| cd4_correction_applied has constant value "0.0" | Constant |
| final_comprehensive_fix_applied has constant value "1.0" | Constant |
| waist_circ_unit_correction_applied has constant value "False" | Constant |
| ALT (U/L) is highly overall correlated with AST (U/L) | High correlation |
| AST (U/L) is highly overall correlated with ALT (U/L) | High correlation |
| CD4 cell count (cells/µL) is highly overall correlated with White blood cell count (×10³/µL) | High correlation |
| HIV viral load (copies/mL) is highly overall correlated with Patient ID | High correlation |
| Patient ID is highly overall correlated with HIV viral load (copies/mL) and 5 other fields | High correlation |
| Race is highly overall correlated with Patient ID | High correlation |
| Sex is highly overall correlated with Patient ID and 1 other field | High correlation |
| White blood cell count (×10³/µL) is highly overall correlated with CD4 cell count (cells/µL) | High correlation |
| hemoglobin_g_dL is highly overall correlated with Sex | High correlation |
| jhb_subregion is highly overall correlated with Patient ID and 2 other fields | High correlation |
| latitude is highly overall correlated with Patient ID and 2 other fields | High correlation |
| longitude is highly overall correlated with Patient ID and 2 other fields | High correlation |
| Race is highly imbalanced (95.8%) | Imbalance |
| HIV viral load (copies/mL) is highly imbalanced (61.9%) | Imbalance |
| CD4 cell count (cells/µL) has 4 (1.8%) missing values | Missing |
| anonymous_patient_id has unique values | Unique |
| Patient ID has unique values | Unique |
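Both imbalance percentages can be reproduced from the category counts in the Race and HIV viral load Common Values tables if imbalance is taken as one minus the normalized Shannon entropy, 1 − H / ln(k), for k distinct values. This definition is an assumption about how the profiler scores imbalance, but it matches both reported figures:

```python
import math
from collections.abc import Sequence


def imbalance(counts: Sequence[int]) -> float:
    """Return 1 - H/ln(k), where H is the Shannon entropy (natural log)
    of the category proportions and k is the number of categories.
    0.0 means perfectly balanced; values near 1.0 mean one category dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single category has nothing to balance (assumption)
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Category counts taken from the Common Values tables in this report.
race = [216, 1]               # Black, Mixed Race
viral_load = [176, 39, 1, 1]  # "0.0", "40.0", "41.0", "63.0"

print(f"Race: {imbalance(race):.1%}")              # Race: 95.8%
print(f"Viral load: {imbalance(viral_load):.1%}")  # Viral load: 61.9%
```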
Reproduction
| Analysis started | 2025-11-24 21:50:06.015012 |
|---|---|
| Analysis finished | 2025-11-24 21:50:12.695530 |
| Duration | 6.68 seconds |
| Software version | ydata-profiling v4.18.0 |
| Download configuration | config.json |
Variables
anonymous_patient_id
Text
Unique
| Distinct | 217 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 15.7 KiB |
Length
| Max length | 17 |
|---|---|
| Median length | 17 |
| Mean length | 17 |
| Min length | 17 |
Unique
| Unique | 217 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | HEAT_329E55DDD278 |
|---|---|
| 2nd row | HEAT_C8A77DD97D98 |
| 3rd row | HEAT_A4407F8E079E |
| 4th row | HEAT_7DCC7C7F1641 |
| 5th row | HEAT_32253618AEF8 |
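Every anonymous_patient_id is 17 characters long and built only from uppercase letters, digits and a single underscore, and each sampled value reads as "HEAT_" followed by 12 uppercase hexadecimal digits. A minimal validation sketch, assuming that pattern holds for the whole column (the pattern is inferred from the samples and character statistics, not documented by the source):

```python
import re

# Hypothetical pattern inferred from the sampled IDs: "HEAT_" + 12 uppercase hex digits.
ID_PATTERN = re.compile(r"HEAT_[0-9A-F]{12}")


def invalid_ids(ids):
    """Return the IDs that do not fully match the inferred pattern."""
    return [i for i in ids if not ID_PATTERN.fullmatch(i)]


sample = [
    "HEAT_329E55DDD278",
    "HEAT_C8A77DD97D98",
    "HEAT_A4407F8E079E",
    "HEAT_7DCC7C7F1641",
    "HEAT_32253618AEF8",
]
print(invalid_ids(sample))                 # []
print(invalid_ids(["heat_329e55ddd278"]))  # ['heat_329e55ddd278']
```

The lowercased entries in the word table would fail this check; that is expected, because that table lowercases its tokens rather than reflecting the stored values.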
Words
| Value | Count | Frequency (%) |
|---|---|---|
| heat_329e55ddd278 | 1 | 0.5% |
| heat_b378f883c50b | 1 | 0.5% |
| heat_133c575ec479 | 1 | 0.5% |
| heat_a4407f8e079e | 1 | 0.5% |
| heat_7dcc7c7f1641 | 1 | 0.5% |
| heat_32253618aef8 | 1 | 0.5% |
| heat_5c22fd95bf09 | 1 | 0.5% |
| heat_a5dc9507fdda | 1 | 0.5% |
| heat_605625b419c8 | 1 | 0.5% |
| heat_931a30042daf | 1 | 0.5% |
| Other values (207) | 207 | 95.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| A | 377 | 10.2% |
| E | 357 | 9.7% |
| H | 217 | 5.9% |
| T | 217 | 5.9% |
| _ | 217 | 5.9% |
| 8 | 191 | 5.2% |
| 6 | 179 | 4.9% |
| 4 | 177 | 4.8% |
| F | 174 | 4.7% |
| 5 | 174 | 4.7% |
| Other values (9) | 1409 | 38.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 1817 | 49.3% |
| Decimal Number | 1655 | 44.9% |
| Connector Punctuation | 217 | 5.9% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 191 | 11.5% |
| 6 | 179 | 10.8% |
| 4 | 177 | 10.7% |
| 5 | 174 | 10.5% |
| 2 | 165 | 10.0% |
| 0 | 163 | 9.8% |
| 7 | 153 | 9.2% |
| 1 | 152 | 9.2% |
| 3 | 152 | 9.2% |
| 9 | 149 | 9.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| A | 377 | 20.7% |
| E | 357 | 19.6% |
| H | 217 | 11.9% |
| T | 217 | 11.9% |
| F | 174 | 9.6% |
| B | 165 | 9.1% |
| D | 157 | 8.6% |
| C | 153 | 8.4% |
Connector Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 217 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 1872 | 50.7% |
| Latin | 1817 | 49.3% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 217 | 11.6% |
| 8 | 191 | 10.2% |
| 6 | 179 | 9.6% |
| 4 | 177 | 9.5% |
| 5 | 174 | 9.3% |
| 2 | 165 | 8.8% |
| 0 | 163 | 8.7% |
| 7 | 153 | 8.2% |
| 1 | 152 | 8.1% |
| 3 | 152 | 8.1% |
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| A | 377 | 20.7% |
| E | 357 | 19.6% |
| H | 217 | 11.9% |
| T | 217 | 11.9% |
| F | 174 | 9.6% |
| B | 165 | 9.1% |
| D | 157 | 8.6% |
| C | 153 | 8.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3689 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| A | 377 | 10.2% |
| E | 357 | 9.7% |
| H | 217 | 5.9% |
| T | 217 | 5.9% |
| _ | 217 | 5.9% |
| 8 | 191 | 5.2% |
| 6 | 179 | 4.9% |
| 4 | 177 | 4.8% |
| F | 174 | 4.7% |
| 5 | 174 | 4.7% |
| Other values (9) | 1409 | 38.2% |
Patient ID
Real number (ℝ)
High correlation Unique
| Distinct | 217 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 173.52995 |
| Minimum | 1 |
|---|---|
| Maximum | 351 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 14.6 |
| Q1 | 87 |
| median | 175 |
| Q3 | 260 |
| 95-th percentile | 333.4 |
| Maximum | 351 |
| Range | 350 |
| Interquartile range (IQR) | 173 |
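The 5-th percentile (14.6) is not an observed ID, so the percentiles are interpolated between neighbouring sorted values. With 217 rows, the 5-th percentile sits at position 0.05 × 216 = 10.8, between the 11th and 12th smallest IDs, while Q1 (position 54) and the median (position 108) land exactly on observed values. Assuming the profiler uses linear interpolation (the pandas default), Python's standard library offers the same convention through `statistics.quantiles(..., method="inclusive")`. A sketch on toy data, since the raw column is not reproduced here:

```python
import statistics


def quantile_summary(values):
    """Q1, median, Q3, IQR and range using linear interpolation
    (statistics.quantiles with method="inclusive")."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "IQR": q3 - q1,
        "range": max(values) - min(values),
    }


toy = list(range(1, 12))  # toy data 1..11, not the study data
print(quantile_summary(toy))
# {'Q1': 3.5, 'median': 6.0, 'Q3': 8.5, 'IQR': 5.0, 'range': 10}

# 5-th and 95-th percentiles are cut points 5 and 95 of n=100.
cuts = statistics.quantiles(toy, n=100, method="inclusive")
print(cuts[4], cuts[94])  # 1.5 10.5
```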
Descriptive statistics
| Standard deviation | 101.83674 |
|---|---|
| Coefficient of variation (CV) | 0.58685398 |
| Kurtosis | -1.1196307 |
| Mean | 173.52995 |
| Median Absolute Deviation (MAD) | 86 |
| Skewness | 0.0021213009 |
| Sum | 37656 |
| Variance | 10370.722 |
| Monotonicity | Strictly increasing |
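Several descriptive statistics follow directly from others. Assuming the usual definitions (CV = standard deviation / mean; MAD = median of absolute deviations from the median, unscaled), the reported figures are mutually consistent. "Strictly increasing" means each Patient ID is larger than the one in the previous row. A sketch:

```python
import statistics


def cv(std: float, mean: float) -> float:
    """Coefficient of variation: standard deviation divided by the mean."""
    return std / mean


def mad(values) -> float:
    """Median absolute deviation from the median (no scaling constant)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def is_strictly_increasing(values) -> bool:
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Reported Patient ID statistics: std 101.83674, mean 173.52995, CV 0.58685398
print(round(cv(101.83674, 173.52995), 6))  # 0.586854

# Toy data (not the study data)
print(mad([1, 2, 3, 4, 100]))                # 1
print(is_strictly_increasing([1, 3, 4, 5]))  # True
print(is_strictly_increasing([1, 3, 3, 5]))  # False
```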
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.5% |
| 234 | 1 | 0.5% |
| 215 | 1 | 0.5% |
| 217 | 1 | 0.5% |
| 220 | 1 | 0.5% |
| 222 | 1 | 0.5% |
| 223 | 1 | 0.5% |
| 224 | 1 | 0.5% |
| 227 | 1 | 0.5% |
| 228 | 1 | 0.5% |
| Other values (207) | 207 | 95.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.5% |
| 3 | 1 | 0.5% |
| 4 | 1 | 0.5% |
| 5 | 1 | 0.5% |
| 6 | 1 | 0.5% |
| 7 | 1 | 0.5% |
| 8 | 1 | 0.5% |
| 9 | 1 | 0.5% |
| 10 | 1 | 0.5% |
| 12 | 1 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 351 | 1 | 0.5% |
| 350 | 1 | 0.5% |
| 349 | 1 | 0.5% |
| 348 | 1 | 0.5% |
| 347 | 1 | 0.5% |
| 346 | 1 | 0.5% |
| 345 | 1 | 0.5% |
| 342 | 1 | 0.5% |
| 338 | 1 | 0.5% |
| 337 | 1 | 0.5% |
study_source
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.6 KiB |
Length
| Max length | 12 |
|---|---|
| Median length | 12 |
| Mean length | 12 |
| Min length | 12 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | JHB_WRHI_003 |
|---|---|
| 2nd row | JHB_WRHI_003 |
| 3rd row | JHB_WRHI_003 |
| 4th row | JHB_WRHI_003 |
| 5th row | JHB_WRHI_003 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| JHB_WRHI_003 | 217 | 100.0% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| jhb_wrhi_003 | 217 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| H | 434 | 16.7% |
| _ | 434 | 16.7% |
| 0 | 434 | 16.7% |
| J | 217 | 8.3% |
| B | 217 | 8.3% |
| W | 217 | 8.3% |
| R | 217 | 8.3% |
| I | 217 | 8.3% |
| 3 | 217 | 8.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 1519 | 58.3% |
| Decimal Number | 651 | 25.0% |
| Connector Punctuation | 434 | 16.7% |
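The category names used throughout these character tables (Uppercase Letter, Decimal Number, Connector Punctuation, and so on) correspond to Unicode general categories (Lu, Nd, Pc, ...). Python's `unicodedata` module exposes the same classification, so the counts for this constant column can be reproduced exactly. A sketch; the long-name mapping is a hypothetical helper covering only the categories that occur in this report:

```python
import unicodedata
from collections import Counter

# Long names for the Unicode general categories that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pc": "Connector Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


study_source = ["JHB_WRHI_003"] * 217
print(category_counts(study_source))
# Counter({'Uppercase Letter': 1519, 'Decimal Number': 651, 'Connector Punctuation': 434})
```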
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| H | 434 | 28.6% |
| J | 217 | 14.3% |
| B | 217 | 14.3% |
| W | 217 | 14.3% |
| R | 217 | 14.3% |
| I | 217 | 14.3% |
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 434 | 66.7% |
| 3 | 217 | 33.3% |
Connector Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 434 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1519 | 58.3% |
| Common | 1085 | 41.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| H | 434 | 28.6% |
| J | 217 | 14.3% |
| B | 217 | 14.3% |
| W | 217 | 14.3% |
| R | 217 | 14.3% |
| I | 217 | 14.3% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 434 | 40.0% |
| 0 | 434 | 40.0% |
| 3 | 217 | 20.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2604 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| H | 434 | 16.7% |
| _ | 434 | 16.7% |
| 0 | 434 | 16.7% |
| J | 217 | 8.3% |
| B | 217 | 8.3% |
| W | 217 | 8.3% |
| R | 217 | 8.3% |
| I | 217 | 8.3% |
| 3 | 217 | 8.3% |
primary_date
Date
| Distinct | 112 |
|---|---|
| Distinct (%) | 51.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.4 KiB |
| Minimum | 2016-07-19 00:00:00 |
|---|---|
| Maximum | 2017-06-15 00:00:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
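For reference, the span between the earliest and latest primary_date, and the distinct-date share, can be checked with the standard library (this makes no assumption about what event primary_date records):

```python
from datetime import date

first = date(2016, 7, 19)  # Minimum primary_date
last = date(2017, 6, 15)   # Maximum primary_date

span_days = (last - first).days
distinct_pct = 100 * 112 / 217  # distinct dates / observations

print(span_days)               # 331
print(f"{distinct_pct:.1f}%")  # 51.6%
```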
Age (at enrolment)
Real number (ℝ)
Patient age at study enrollment
| Distinct | 39 |
|---|---|
| Distinct (%) | 18.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 41.663594 |
| Minimum | 20 |
|---|---|
| Maximum | 67 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 30 |
| Q1 | 36 |
| median | 40 |
| Q3 | 47 |
| 95-th percentile | 56.2 |
| Maximum | 67 |
| Range | 47 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 8.0984802 |
|---|---|
| Coefficient of variation (CV) | 0.19437786 |
| Kurtosis | -0.0090547417 |
| Mean | 41.663594 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | 0.40369989 |
| Sum | 9041 |
| Variance | 65.585381 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 40 | 16 | 7.4% |
| 39 | 15 | 6.9% |
| 46 | 13 | 6.0% |
| 34 | 13 | 6.0% |
| 37 | 13 | 6.0% |
| 35 | 11 | 5.1% |
| 42 | 11 | 5.1% |
| 38 | 10 | 4.6% |
| 44 | 9 | 4.1% |
| 49 | 7 | 3.2% |
| Other values (29) | 99 | 45.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 20 | 1 | 0.5% |
| 25 | 1 | 0.5% |
| 26 | 2 | 0.9% |
| 27 | 1 | 0.5% |
| 28 | 2 | 0.9% |
| 29 | 1 | 0.5% |
| 30 | 7 | 3.2% |
| 31 | 6 | 2.8% |
| 32 | 1 | 0.5% |
| 33 | 5 | 2.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 67 | 1 | 0.5% |
| 63 | 1 | 0.5% |
| 62 | 1 | 0.5% |
| 61 | 1 | 0.5% |
| 58 | 2 | 0.9% |
| 57 | 5 | 2.3% |
| 56 | 1 | 0.5% |
| 55 | 4 | 1.8% |
| 54 | 7 | 3.2% |
| 53 | 4 | 1.8% |
Sex
Categorical
High correlation
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 13.2 KiB |
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 5.4101382 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Female |
|---|---|
| 2nd row | Female |
| 3rd row | Female |
| 4th row | Female |
| 5th row | Male |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Female | 153 | 70.5% |
| Male | 64 | 29.5% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| female | 153 | 70.5% |
| male | 64 | 29.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 370 | 31.5% |
| a | 217 | 18.5% |
| l | 217 | 18.5% |
| F | 153 | 13.0% |
| m | 153 | 13.0% |
| M | 64 | 5.5% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 957 | 81.5% |
| Uppercase Letter | 217 | 18.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 370 | 38.7% |
| a | 217 | 22.7% |
| l | 217 | 22.7% |
| m | 153 | 16.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| F | 153 | 70.5% |
| M | 64 | 29.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1174 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 370 | 31.5% |
| a | 217 | 18.5% |
| l | 217 | 18.5% |
| F | 153 | 13.0% |
| m | 153 | 13.0% |
| M | 64 | 5.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1174 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| e | 370 | 31.5% |
| a | 217 | 18.5% |
| l | 217 | 18.5% |
| F | 153 | 13.0% |
| m | 153 | 13.0% |
| M | 64 | 5.5% |
Race
Categorical
High correlation Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 13.1 KiB |
Length
| Max length | 10 |
|---|---|
| Median length | 5 |
| Mean length | 5.0230415 |
| Min length | 5 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | 0.5% |
Sample
| 1st row | Black |
|---|---|
| 2nd row | Black |
| 3rd row | Black |
| 4th row | Black |
| 5th row | Black |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Black | 216 | 99.5% |
| Mixed Race | 1 | 0.5% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| black | 216 | |
| mixed | 1 | 0.5% |
| race | 1 | 0.5% |
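The word table splits multi-word categories, so "Mixed Race" contributes one count each to "mixed" and "race". A minimal approximation, assuming words are the lowercased, whitespace-separated tokens of each value (the profiler may apply further rules, such as stop-word handling):

```python
from collections import Counter


def word_counts(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    return Counter(token for value in values for token in value.lower().split())


race = ["Black"] * 216 + ["Mixed Race"]
counts = word_counts(race)
print(counts.most_common())  # [('black', 216), ('mixed', 1), ('race', 1)]
print(sum(counts.values()))  # 218
```

Because the token total (218) exceeds the row count (217), the share for "black" depends on which denominator the profiler applies.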
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| a | 217 | 19.9% |
| c | 217 | 19.9% |
| B | 216 | 19.8% |
| l | 216 | 19.8% |
| k | 216 | 19.8% |
| e | 2 | 0.2% |
| M | 1 | 0.1% |
| i | 1 | 0.1% |
| x | 1 | 0.1% |
| d | 1 | 0.1% |
| Other values (2) | 2 | 0.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 871 | 79.9% |
| Uppercase Letter | 218 | 20.0% |
| Space Separator | 1 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| a | 217 | 24.9% |
| c | 217 | 24.9% |
| l | 216 | 24.8% |
| k | 216 | 24.8% |
| e | 2 | 0.2% |
| i | 1 | 0.1% |
| x | 1 | 0.1% |
| d | 1 | 0.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| B | 216 | 99.1% |
| M | 1 | 0.5% |
| R | 1 | 0.5% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1089 | 99.9% |
| Common | 1 | 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| a | 217 | 19.9% |
| c | 217 | 19.9% |
| B | 216 | 19.8% |
| l | 216 | 19.8% |
| k | 216 | 19.8% |
| e | 2 | 0.2% |
| M | 1 | 0.1% |
| i | 1 | 0.1% |
| x | 1 | 0.1% |
| d | 1 | 0.1% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1090 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| a | 217 | 19.9% |
| c | 217 | 19.9% |
| B | 216 | 19.8% |
| l | 216 | 19.8% |
| k | 216 | 19.8% |
| e | 2 | 0.2% |
| M | 1 | 0.1% |
| i | 1 | 0.1% |
| x | 1 | 0.1% |
| d | 1 | 0.1% |
| Other values (2) | 2 | 0.2% |
latitude
Categorical
High correlation
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 13.8 KiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | -26.2041 |
|---|---|
| 2nd row | -26.2041 |
| 3rd row | -26.2041 |
| 4th row | -26.2041 |
| 5th row | -26.2309 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| -26.2041 | 190 | 87.6% |
| -26.2309 | 27 | 12.4% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| 26.2041 | 190 | 87.6% |
| 26.2309 | 27 | 12.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 434 | 25.0% |
| - | 217 | 12.5% |
| 6 | 217 | 12.5% |
| . | 217 | 12.5% |
| 0 | 217 | 12.5% |
| 4 | 190 | 10.9% |
| 1 | 190 | 10.9% |
| 3 | 27 | 1.6% |
| 9 | 27 | 1.6% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 1302 | 75.0% |
| Dash Punctuation | 217 | 12.5% |
| Other Punctuation | 217 | 12.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 434 | 33.3% |
| 6 | 217 | 16.7% |
| 0 | 217 | 16.7% |
| 4 | 190 | 14.6% |
| 1 | 190 | 14.6% |
| 3 | 27 | 2.1% |
| 9 | 27 | 2.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 217 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 217 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 1736 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 434 | 25.0% |
| - | 217 | 12.5% |
| 6 | 217 | 12.5% |
| . | 217 | 12.5% |
| 0 | 217 | 12.5% |
| 4 | 190 | 10.9% |
| 1 | 190 | 10.9% |
| 3 | 27 | 1.6% |
| 9 | 27 | 1.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1736 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 434 | 25.0% |
| - | 217 | 12.5% |
| 6 | 217 | 12.5% |
| . | 217 | 12.5% |
| 0 | 217 | 12.5% |
| 4 | 190 | 10.9% |
| 1 | 190 | 10.9% |
| 3 | 27 | 1.6% |
| 9 | 27 | 1.6% |
longitude
Categorical
High correlation
| Distinct | 3 |
|---|---|
| Distinct (%) | 1.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 13.6 KiB |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 7 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 28.0473 |
|---|---|
| 2nd row | 28.0473 |
| 3rd row | 28.0473 |
| 4th row | 27.9394 |
| 5th row | 27.8585 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 28.0473 | 172 | 79.3% |
| 27.8585 | 27 | 12.4% |
| 27.9394 | 18 | 8.3% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| 28.0473 | 172 | 79.3% |
| 27.8585 | 27 | 12.4% |
| 27.9394 | 18 | 8.3% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 226 | 14.9% |
| 2 | 217 | 14.3% |
| . | 217 | 14.3% |
| 7 | 217 | 14.3% |
| 4 | 190 | 12.5% |
| 3 | 190 | 12.5% |
| 0 | 172 | 11.3% |
| 5 | 54 | 3.6% |
| 9 | 36 | 2.4% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 1302 | 85.7% |
| Other Punctuation | 217 | 14.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 226 | 17.4% |
| 2 | 217 | 16.7% |
| 7 | 217 | 16.7% |
| 4 | 190 | 14.6% |
| 3 | 190 | 14.6% |
| 0 | 172 | 13.2% |
| 5 | 54 | 4.1% |
| 9 | 36 | 2.8% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 217 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 1519 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 226 | 14.9% |
| 2 | 217 | 14.3% |
| . | 217 | 14.3% |
| 7 | 217 | 14.3% |
| 4 | 190 | 12.5% |
| 3 | 190 | 12.5% |
| 0 | 172 | 11.3% |
| 5 | 54 | 3.6% |
| 9 | 36 | 2.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1519 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 226 | 14.9% |
| 2 | 217 | 14.3% |
| . | 217 | 14.3% |
| 7 | 217 | 14.3% |
| 4 | 190 | 12.5% |
| 3 | 190 | 12.5% |
| 0 | 172 | 11.3% |
| 5 | 54 | 3.6% |
| 9 | 36 | 2.4% |
province
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 13.6 KiB |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 7 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Gauteng |
|---|---|
| 2nd row | Gauteng |
| 3rd row | Gauteng |
| 4th row | Gauteng |
| 5th row | Gauteng |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Gauteng | 217 | 100.0% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| gauteng | 217 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| G | 217 | 14.3% |
| a | 217 | 14.3% |
| u | 217 | 14.3% |
| t | 217 | 14.3% |
| e | 217 | 14.3% |
| n | 217 | 14.3% |
| g | 217 | 14.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 1302 | 85.7% |
| Uppercase Letter | 217 | 14.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| a | 217 | 16.7% |
| u | 217 | 16.7% |
| t | 217 | 16.7% |
| e | 217 | 16.7% |
| n | 217 | 16.7% |
| g | 217 | 16.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| G | 217 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1519 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| G | 217 | 14.3% |
| a | 217 | 14.3% |
| u | 217 | 14.3% |
| t | 217 | 14.3% |
| e | 217 | 14.3% |
| n | 217 | 14.3% |
| g | 217 | 14.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1519 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| G | 217 | 14.3% |
| a | 217 | 14.3% |
| u | 217 | 14.3% |
| t | 217 | 14.3% |
| e | 217 | 14.3% |
| n | 217 | 14.3% |
| g | 217 | 14.3% |
city
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.6 KiB |
Length
| Max length | 12 |
|---|---|
| Median length | 12 |
| Mean length | 12 |
| Min length | 12 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Johannesburg |
|---|---|
| 2nd row | Johannesburg |
| 3rd row | Johannesburg |
| 4th row | Johannesburg |
| 5th row | Johannesburg |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Johannesburg | 217 | 100.0% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| johannesburg | 217 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| n | 434 | 16.7% |
| J | 217 | 8.3% |
| o | 217 | 8.3% |
| h | 217 | 8.3% |
| a | 217 | 8.3% |
| e | 217 | 8.3% |
| s | 217 | 8.3% |
| b | 217 | 8.3% |
| u | 217 | 8.3% |
| r | 217 | 8.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 2387 | 91.7% |
| Uppercase Letter | 217 | 8.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| n | 434 | 18.2% |
| o | 217 | 9.1% |
| h | 217 | 9.1% |
| a | 217 | 9.1% |
| e | 217 | 9.1% |
| s | 217 | 9.1% |
| b | 217 | 9.1% |
| u | 217 | 9.1% |
| r | 217 | 9.1% |
| g | 217 | 9.1% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| J | 217 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 2604 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| n | 434 | 16.7% |
| J | 217 | 8.3% |
| o | 217 | 8.3% |
| h | 217 | 8.3% |
| a | 217 | 8.3% |
| e | 217 | 8.3% |
| s | 217 | 8.3% |
| b | 217 | 8.3% |
| u | 217 | 8.3% |
| r | 217 | 8.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2604 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| n | 434 | 16.7% |
| J | 217 | 8.3% |
| o | 217 | 8.3% |
| h | 217 | 8.3% |
| a | 217 | 8.3% |
| e | 217 | 8.3% |
| s | 217 | 8.3% |
| b | 217 | 8.3% |
| u | 217 | 8.3% |
| r | 217 | 8.3% |
jhb_subregion
Categorical
High correlation
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.4 KiB |
Length
| Max length | 11 |
|---|---|
| Median length | 11 |
| Mean length | 11 |
| Min length | 11 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Central_JHB |
|---|---|
| 2nd row | Central_JHB |
| 3rd row | Central_JHB |
| 4th row | Western_JHB |
| 5th row | Western_JHB |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Central_JHB | 172 | 79.3% |
| Western_JHB | 45 | 20.7% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| central_jhb | 172 | 79.3% |
| western_jhb | 45 | 20.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 262 | 11.0% |
| n | 217 | 9.1% |
| t | 217 | 9.1% |
| r | 217 | 9.1% |
| _ | 217 | 9.1% |
| J | 217 | 9.1% |
| H | 217 | 9.1% |
| B | 217 | 9.1% |
| C | 172 | 7.2% |
| a | 172 | 7.2% |
| Other values (3) | 262 | 11.0% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 1302 | 54.5% |
| Uppercase Letter | 868 | 36.4% |
| Connector Punctuation | 217 | 9.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 262 | 20.1% |
| n | 217 | 16.7% |
| t | 217 | 16.7% |
| r | 217 | 16.7% |
| a | 172 | 13.2% |
| l | 172 | 13.2% |
| s | 45 | 3.5% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| J | 217 | 25.0% |
| H | 217 | 25.0% |
| B | 217 | 25.0% |
| C | 172 | 19.8% |
| W | 45 | 5.2% |
Connector Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 217 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 2170 | 90.9% |
| Common | 217 | 9.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 262 | 12.1% |
| n | 217 | 10.0% |
| t | 217 | 10.0% |
| r | 217 | 10.0% |
| J | 217 | 10.0% |
| H | 217 | 10.0% |
| B | 217 | 10.0% |
| C | 172 | 7.9% |
| a | 172 | 7.9% |
| l | 172 | 7.9% |
| Other values (2) | 90 | 4.1% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| _ | 217 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 2387 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| e | 262 | 11.0% |
| n | 217 | 9.1% |
| t | 217 | 9.1% |
| r | 217 | 9.1% |
| _ | 217 | 9.1% |
| J | 217 | 9.1% |
| H | 217 | 9.1% |
| B | 217 | 9.1% |
| C | 172 | 7.2% |
| a | 172 | 7.2% |
| Other values (3) | 262 | 11.0% |
CD4 cell count (cells/µL)
Real number (ℝ)
High correlation Missing
CD4+ T lymphocyte count (missing codes removed)
| Distinct | 194 |
|---|---|
| Distinct (%) | 91.1% |
| Missing | 4 |
| Missing (%) | 1.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 669.23944 |
| Minimum | 90 |
|---|---|
| Maximum | 1596 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 90 |
|---|---|
| 5-th percentile | 210.4 |
| Q1 | 496 |
| median | 637 |
| Q3 | 885 |
| 95-th percentile | 1136.8 |
| Maximum | 1596 |
| Range | 1506 |
| Interquartile range (IQR) | 389 |
Descriptive statistics
| Standard deviation | 278.34576 |
|---|---|
| Coefficient of variation (CV) | 0.41591357 |
| Kurtosis | 0.1796013 |
| Mean | 669.23944 |
| Median Absolute Deviation (MAD) | 184 |
| Skewness | 0.39364919 |
| Sum | 142548 |
| Variance | 77476.362 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 894 | 3 | 1.4% |
| 596 | 3 | 1.4% |
| 947 | 3 | 1.4% |
| 620 | 2 | 0.9% |
| 404 | 2 | 0.9% |
| 567 | 2 | 0.9% |
| 847 | 2 | 0.9% |
| 626 | 2 | 0.9% |
| 594 | 2 | 0.9% |
| 651 | 2 | 0.9% |
| Other values (184) | 190 | 87.6% |
| (Missing) | 4 | 1.8% |
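The CD4 percentages use two denominators: Distinct (%) is computed over the 213 non-missing values (194 / 213 = 91.1%), while Missing (%) and the frequency column use all 217 rows (4 / 217 = 1.8%). A small sketch of both conventions on toy data, with `None` standing in for a missing value (an assumption about representation, not the source's encoding):

```python
def column_summary(values):
    """Missing share over all rows; distinct share over non-missing values."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    n_distinct = len(set(present))
    return {
        "missing": n_missing,
        "missing_pct": 100 * n_missing / len(values),
        "distinct": n_distinct,
        "distinct_pct": 100 * n_distinct / len(present),
    }


toy = [90, 121, 121, None, 894, None, 894, 894]  # toy values, not study data
print(column_summary(toy))
# {'missing': 2, 'missing_pct': 25.0, 'distinct': 3, 'distinct_pct': 50.0}

# Reported CD4 figures
print(f"{100 * 194 / 213:.1f}% distinct, {100 * 4 / 217:.1f}% missing")
# 91.1% distinct, 1.8% missing
```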
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 90 | 1 | 0.5% |
| 121 | 1 | 0.5% |
| 128 | 1 | 0.5% |
| 138 | 1 | 0.5% |
| 154 | 1 | 0.5% |
| 160 | 1 | 0.5% |
| 165 | 1 | 0.5% |
| 177 | 1 | 0.5% |
| 178 | 1 | 0.5% |
| 190 | 1 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1596 | 1 | 0.5% |
| 1501 | 1 | 0.5% |
| 1371 | 1 | 0.5% |
| 1339 | 1 | 0.5% |
| 1254 | 1 | 0.5% |
| 1239 | 1 | 0.5% |
| 1230 | 1 | 0.5% |
| 1187 | 1 | 0.5% |
| 1184 | 1 | 0.5% |
| 1178 | 1 | 0.5% |
HIV viral load (copies/mL)
Categorical
High correlation Imbalance
HIV RNA copies per mL (missing codes removed)
| Distinct | 4 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.8 KiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.1889401 |
| Min length | 3 |
Unique
| Unique | 2 |
|---|---|
| Unique (%) | 0.9% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 40.0 |
| 4th row | 0.0 |
| 5th row | 40.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 176 | 81.1% |
| 40.0 | 39 | 18.0% |
| 41.0 | 1 | 0.5% |
| 63.0 | 1 | 0.5% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 176 | 81.1% |
| 40.0 | 39 | 18.0% |
| 41.0 | 1 | 0.5% |
| 63.0 | 1 | 0.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 432 | 62.4% |
| . | 217 | 31.4% |
| 4 | 40 | 5.8% |
| 1 | 1 | 0.1% |
| 6 | 1 | 0.1% |
| 3 | 1 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 475 | 68.6% |
| Other Punctuation | 217 | 31.4% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 432 | 90.9% |
| 4 | 40 | 8.4% |
| 1 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 3 | 1 | 0.2% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 217 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 692 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 432 | 62.4% |
| . | 217 | 31.4% |
| 4 | 40 | 5.8% |
| 1 | 1 | 0.1% |
| 6 | 1 | 0.1% |
| 3 | 1 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 692 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 432 | 62.4% |
| . | 217 | 31.4% |
| 4 | 40 | 5.8% |
| 1 | 1 | 0.1% |
| 6 | 1 | 0.1% |
| 3 | 1 | 0.1% |
HIV_status
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 13.8 KiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Positive |
|---|---|
| 2nd row | Positive |
| 3rd row | Positive |
| 4th row | Positive |
| 5th row | Positive |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Positive | 217 | 100.0% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| positive | 217 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| i | 434 | 25.0% |
| P | 217 | 12.5% |
| o | 217 | 12.5% |
| s | 217 | 12.5% |
| t | 217 | 12.5% |
| v | 217 | 12.5% |
| e | 217 | 12.5% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 1519 | 87.5% |
| Uppercase Letter | 217 | 12.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| i | 434 | 28.6% |
| o | 217 | 14.3% |
| s | 217 | 14.3% |
| t | 217 | 14.3% |
| v | 217 | 14.3% |
| e | 217 | 14.3% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| P | 217 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1736 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| i | 434 | 25.0% |
| P | 217 | 12.5% |
| o | 217 | 12.5% |
| s | 217 | 12.5% |
| t | 217 | 12.5% |
| v | 217 | 12.5% |
| e | 217 | 12.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1736 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| i | 434 | 25.0% |
| P | 217 | 12.5% |
| o | 217 | 12.5% |
| s | 217 | 12.5% |
| t | 217 | 12.5% |
| v | 217 | 12.5% |
| e | 217 | 12.5% |
Antiretroviral Therapy Status
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 13.8 KiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Positive |
|---|---|
| 2nd row | Positive |
| 3rd row | Positive |
| 4th row | Positive |
| 5th row | Positive |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Positive | 217 | 100.0% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| positive | 217 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| i | 434 | 25.0% |
| P | 217 | 12.5% |
| o | 217 | 12.5% |
| s | 217 | 12.5% |
| t | 217 | 12.5% |
| v | 217 | 12.5% |
| e | 217 | 12.5% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 1519 | 87.5% |
| Uppercase Letter | 217 | 12.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| i | 434 | 28.6% |
| o | 217 | 14.3% |
| s | 217 | 14.3% |
| t | 217 | 14.3% |
| v | 217 | 14.3% |
| e | 217 | 14.3% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| P | 217 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1736 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| i | 434 | 25.0% |
| P | 217 | 12.5% |
| o | 217 | 12.5% |
| s | 217 | 12.5% |
| t | 217 | 12.5% |
| v | 217 | 12.5% |
| e | 217 | 12.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1736 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| i | 434 | 25.0% |
| P | 217 | 12.5% |
| o | 217 | 12.5% |
| s | 217 | 12.5% |
| t | 217 | 12.5% |
| v | 217 | 12.5% |
| e | 217 | 12.5% |
White blood cell count (×10³/µL)
Real number (ℝ)
High correlation
| Distinct | 170 |
|---|---|
| Distinct (%) | 78.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.4964055 |
| Minimum | 2.25 |
|---|---|
| Maximum | 15.85 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 2.25 |
|---|---|
| 5-th percentile | 3.358 |
| Q1 | 4.36 |
| median | 5.21 |
| Q3 | 6.5 |
| 95-th percentile | 8.102 |
| Maximum | 15.85 |
| Range | 13.6 |
| Interquartile range (IQR) | 2.14 |
Descriptive statistics
| Standard deviation | 1.7174402 |
|---|---|
| Coefficient of variation (CV) | 0.31246607 |
| Kurtosis | 8.8783306 |
| Mean | 5.4964055 |
| Median Absolute Deviation (MAD) | 0.99 |
| Skewness | 1.8992561 |
| Sum | 1192.72 |
| Variance | 2.9496009 |
| Monotonicity | Not monotonic |
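The descriptive statistics rows follow standard sample formulas. The sketch below assumes the sample variance (n - 1 denominator), the bias-corrected skewness and excess kurtosis that pandas' `Series.skew()` and `Series.kurt()` compute, and MAD as the unscaled median of absolute deviations from the median; whether the profiler used exactly these conventions is an assumption. The reported figures are at least internally consistent: CV = 1.7174402 / 5.4964055 ≈ 0.31247 and Variance = 1.7174402² ≈ 2.94960.

```python
import math
import statistics


def descriptive_statistics(values):
    """Return the rows of a 'Descriptive statistics' table for a numeric column."""
    n = len(values)
    if n < 4:
        raise ValueError("bias-corrected kurtosis needs at least 4 values")
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s2 = sum(d ** 2 for d in dev)  # sums of powered deviations from the mean
    s3 = sum(d ** 3 for d in dev)
    s4 = sum(d ** 4 for d in dev)
    variance = s2 / (n - 1)  # sample variance
    std = math.sqrt(variance)
    # Adjusted Fisher-Pearson skewness (same formula as pandas' Series.skew()).
    skewness = n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5
    # Bias-corrected excess kurtosis (same formula as pandas' Series.kurt()).
    kurtosis = (n * (n + 1) * (n - 1) * s4) / ((n - 2) * (n - 3) * s2 ** 2) - (
        3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])
    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Kurtosis": kurtosis,
        "Mean": mean,
        "Median Absolute Deviation (MAD)": mad,
        "Skewness": skewness,
        "Sum": sum(values),
        "Variance": variance,
    }


print(descriptive_statistics([1, 2, 3, 4, 5]))
```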
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.36 | 3 | 1.4% |
| 4.45 | 3 | 1.4% |
| 5.74 | 3 | 1.4% |
| 5.98 | 3 | 1.4% |
| 4.22 | 3 | 1.4% |
| 5.21 | 2 | 0.9% |
| 4.47 | 2 | 0.9% |
| 4.15 | 2 | 0.9% |
| 4.2 | 2 | 0.9% |
| 7.48 | 2 | 0.9% |
| Other values (160) | 192 | 88.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.25 | 1 | 0.5% |
| 2.28 | 1 | 0.5% |
| 2.4 | 1 | 0.5% |
| 2.48 | 1 | 0.5% |
| 2.5 | 1 | 0.5% |
| 2.97 | 1 | 0.5% |
| 3.15 | 1 | 0.5% |
| 3.17 | 1 | 0.5% |
| 3.21 | 1 | 0.5% |
| 3.25 | 1 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15.85 | 1 | 0.5% |
| 14.98 | 1 | 0.5% |
| 8.97 | 1 | 0.5% |
| 8.91 | 1 | 0.5% |
| 8.64 | 1 | 0.5% |
| 8.56 | 1 | 0.5% |
| 8.47 | 2 | 0.9% |
| 8.3 | 1 | 0.5% |
| 8.21 | 1 | 0.5% |
| 8.15 | 1 | 0.5% |
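The common-value and extreme-value tables can be rebuilt from a column with a frequency count. In the sketch below (a hypothetical helper, not the profiler's implementation), each row is `(value, count, frequency_pct)` with the percentage taken over all 217 rows; the common-values list ends with an "Other values (k)" row that pools every value not shown, which is how 192 / 217 becomes 88.5% in the white blood cell table. Tie order among equally frequent values may differ from the report.

```python
from collections import Counter


def value_tables(values, top=10):
    """Return (common, minimum, maximum) rows of (value, count, frequency_pct)."""
    n = len(values)
    counts = Counter(values)

    def row(value, count):
        return (value, count, round(100 * count / n, 1))

    most = counts.most_common(top)
    common = [row(v, c) for v, c in most]
    hidden_distinct = len(counts) - len(most)
    if hidden_distinct > 0:
        # Pool all remaining distinct values into one summary row.
        common.append(row(f"Other values ({hidden_distinct})", n - sum(c for _, c in most)))
    ordered = sorted(counts)
    minimum = [row(v, counts[v]) for v in ordered[:top]]
    maximum = [row(v, counts[v]) for v in reversed(ordered[-top:])]
    return common, minimum, maximum


common, minimum, maximum = value_tables([5, 5, 5, 3, 3, 1, 2, 4, 6, 7, 8, 9, 10, 11, 12], top=3)
print(common, minimum, maximum, sep="\n")
```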
Platelet count (×10³/µL)
Real number (ℝ)
Platelet count (missing codes removed)
| Distinct | 145 |
|---|---|
| Distinct (%) | 66.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 264.53456 |
| Minimum | 110 |
| Maximum | 588 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 110 |
|---|---|
| 5-th percentile | 171.8 |
| Q1 | 219 |
| median | 251 |
| Q3 | 306 |
| 95-th percentile | 385.4 |
| Maximum | 588 |
| Range | 478 |
| Interquartile range (IQR) | 87 |
Descriptive statistics
| Standard deviation | 71.369474 |
|---|---|
| Coefficient of variation (CV) | 0.26979262 |
| Kurtosis | 2.7657312 |
| Mean | 264.53456 |
| Median Absolute Deviation (MAD) | 44 |
| Skewness | 1.1693336 |
| Sum | 57404 |
| Variance | 5093.6018 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 207 | 4 | 1.8% |
| 234 | 4 | 1.8% |
| 230 | 4 | 1.8% |
| 264 | 4 | 1.8% |
| 245 | 4 | 1.8% |
| 164 | 3 | 1.4% |
| 205 | 3 | 1.4% |
| 237 | 3 | 1.4% |
| 235 | 3 | 1.4% |
| 219 | 3 | 1.4% |
| Other values (135) | 182 | 83.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 110 | 1 | 0.5% |
| 134 | 1 | 0.5% |
| 141 | 1 | 0.5% |
| 146 | 1 | 0.5% |
| 148 | 1 | 0.5% |
| 163 | 1 | 0.5% |
| 164 | 3 | 1.4% |
| 165 | 1 | 0.5% |
| 171 | 1 | 0.5% |
| 172 | 1 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 588 | 1 | 0.5% |
| 527 | 2 | 0.9% |
| 477 | 1 | 0.5% |
| 460 | 1 | 0.5% |
| 447 | 1 | 0.5% |
| 422 | 1 | 0.5% |
| 403 | 1 | 0.5% |
| 399 | 1 | 0.5% |
| 390 | 1 | 0.5% |
| 387 | 1 | 0.5% |
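The platelet column's right skew (skewness 1.17) shows up in its largest values. One conventional way to quantify that tail from the quantile table is Tukey's fences, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR; the report itself does not apply this rule, so the 1.5 multiplier and the helper names below are assumptions for illustration. With Q1 = 219 and Q3 = 306 the fences are 88.5 and 436.5. Every platelet value outside the ten listed maxima is at most 387, so the six observations at 447, 460, 477, 527 (twice) and 588 are the only ones above the upper fence, and none fall below the lower fence (the minimum is 110).

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) Tukey fences from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outside(value_counts, lower, upper):
    """Count observations outside the fences, given (value, count) pairs."""
    return sum(count for value, count in value_counts if value < lower or value > upper)


# Platelet quartiles and largest values, copied from the tables above.
lower, upper = tukey_fences(219, 306)
platelet_max10 = [(588, 1), (527, 2), (477, 1), (460, 1), (447, 1),
                  (422, 1), (403, 1), (399, 1), (390, 1), (387, 1)]
print(lower, upper, count_outside(platelet_max10, lower, upper))
```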
hemoglobin_g_dL
Real number (ℝ)
High correlation
| Distinct | 68 |
|---|---|
| Distinct (%) | 31.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13.53871 |
| Minimum | 7.6 |
| Maximum | 17.7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 7.6 |
|---|---|
| 5-th percentile | 10.5 |
| Q1 | 12.5 |
| median | 13.5 |
| Q3 | 14.7 |
| 95-th percentile | 16.3 |
| Maximum | 17.7 |
| Range | 10.1 |
| Interquartile range (IQR) | 2.2 |
Descriptive statistics
| Standard deviation | 1.753429 |
|---|---|
| Coefficient of variation (CV) | 0.12951227 |
| Kurtosis | 0.35834819 |
| Mean | 13.53871 |
| Median Absolute Deviation (MAD) | 1.1 |
| Skewness | -0.28913593 |
| Sum | 2937.9 |
| Variance | 3.0745131 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 14 | 10 | 4.6% |
| 14.1 | 9 | 4.1% |
| 12.5 | 7 | 3.2% |
| 12.9 | 7 | 3.2% |
| 13 | 7 | 3.2% |
| 13.3 | 7 | 3.2% |
| 13.8 | 6 | 2.8% |
| 15.5 | 6 | 2.8% |
| 14.6 | 6 | 2.8% |
| 13.2 | 6 | 2.8% |
| Other values (58) | 146 | 67.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.6 | 1 | 0.5% |
| 8.6 | 1 | 0.5% |
| 9 | 1 | 0.5% |
| 9.3 | 1 | 0.5% |
| 9.7 | 1 | 0.5% |
| 9.9 | 1 | 0.5% |
| 10.1 | 3 | 1.4% |
| 10.4 | 1 | 0.5% |
| 10.5 | 3 | 1.4% |
| 10.6 | 1 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 17.7 | 1 | 0.5% |
| 17.6 | 1 | 0.5% |
| 17.4 | 1 | 0.5% |
| 17.3 | 1 | 0.5% |
| 16.9 | 3 | 1.4% |
| 16.5 | 1 | 0.5% |
| 16.4 | 1 | 0.5% |
| 16.3 | 5 | 2.3% |
| 16 | 1 | 0.5% |
| 15.9 | 3 | 1.4% |
ALT (U/L)
Real number (ℝ)
Alanine aminotransferase (missing codes removed)
| Distinct | 42 |
|---|---|
| Distinct (%) | 19.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20.926267 |
| Minimum | 6 |
| Maximum | 98 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 6 |
|---|---|
| 5-th percentile | 10 |
| Q1 | 14 |
| median | 17 |
| Q3 | 24 |
| 95-th percentile | 41 |
| Maximum | 98 |
| Range | 92 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 13.499798 |
|---|---|
| Coefficient of variation (CV) | 0.64511255 |
| Kurtosis | 14.618813 |
| Mean | 20.926267 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 3.3227683 |
| Sum | 4541 |
| Variance | 182.24454 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 15 | 17 | 7.8% |
| 14 | 16 | 7.4% |
| 17 | 15 | 6.9% |
| 12 | 14 | 6.5% |
| 16 | 13 | 6.0% |
| 10 | 11 | 5.1% |
| 21 | 11 | 5.1% |
| 13 | 10 | 4.6% |
| 18 | 9 | 4.1% |
| 20 | 8 | 3.7% |
| Other values (32) | 93 | 42.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 2 | 0.9% |
| 7 | 1 | 0.5% |
| 8 | 2 | 0.9% |
| 9 | 4 | 1.8% |
| 10 | 11 | 5.1% |
| 11 | 8 | 3.7% |
| 12 | 14 | 6.5% |
| 13 | 10 | 4.6% |
| 14 | 16 | 7.4% |
| 15 | 17 | 7.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 98 | 2 | 0.9% |
| 97 | 1 | 0.5% |
| 71 | 1 | 0.5% |
| 70 | 1 | 0.5% |
| 64 | 1 | 0.5% |
| 50 | 1 | 0.5% |
| 46 | 2 | 0.9% |
| 43 | 1 | 0.5% |
| 41 | 2 | 0.9% |
| 40 | 2 | 0.9% |
AST (U/L)
Real number (ℝ)
High correlation
| Distinct | 34 |
|---|---|
| Distinct (%) | 15.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.705069 |
| Minimum | 10 |
| Maximum | 97 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 10 |
|---|---|
| 5-th percentile | 14.8 |
| Q1 | 18 |
| median | 21 |
| Q3 | 25 |
| 95-th percentile | 33.2 |
| Maximum | 97 |
| Range | 87 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 8.3258645 |
|---|---|
| Coefficient of variation (CV) | 0.36669629 |
| Kurtosis | 30.009058 |
| Mean | 22.705069 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 4.0488585 |
| Sum | 4927 |
| Variance | 69.32002 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 22 | 19 | 8.8% |
| 20 | 18 | 8.3% |
| 21 | 18 | 8.3% |
| 17 | 17 | 7.8% |
| 23 | 14 | 6.5% |
| 19 | 13 | 6.0% |
| 15 | 13 | 6.0% |
| 18 | 13 | 6.0% |
| 25 | 11 | 5.1% |
| 24 | 11 | 5.1% |
| Other values (24) | 70 | 32.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 1 | 0.5% |
| 12 | 3 | 1.4% |
| 13 | 1 | 0.5% |
| 14 | 6 | 2.8% |
| 15 | 13 | 6.0% |
| 16 | 8 | 3.7% |
| 17 | 17 | 7.8% |
| 18 | 13 | 6.0% |
| 19 | 13 | 6.0% |
| 20 | 18 | 8.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 97 | 1 | 0.5% |
| 55 | 1 | 0.5% |
| 50 | 1 | 0.5% |
| 48 | 1 | 0.5% |
| 43 | 1 | 0.5% |
| 41 | 1 | 0.5% |
| 40 | 1 | 0.5% |
| 39 | 1 | 0.5% |
| 38 | 1 | 0.5% |
| 35 | 1 | 0.5% |
total_cholesterol_mg_dL
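Real number (ℝ)
Total cholesterol (the recorded range of 2.82 to 8.18 corresponds to mmol/L, not the mg/dL named in the column)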
| Distinct | 155 |
|---|---|
| Distinct (%) | 71.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.9320276 |
| Minimum | 2.82 |
| Maximum | 8.18 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.4 KiB |
Quantile statistics
| Minimum | 2.82 |
|---|---|
| 5-th percentile | 3.43 |
| Q1 | 4.28 |
| median | 4.74 |
| Q3 | 5.53 |
| 95-th percentile | 6.698 |
| Maximum | 8.18 |
| Range | 5.36 |
| Interquartile range (IQR) | 1.25 |
Descriptive statistics
| Standard deviation | 0.962825 |
|---|---|
| Coefficient of variation (CV) | 0.1952189 |
| Kurtosis | -0.033836679 |
| Mean | 4.9320276 |
| Median Absolute Deviation (MAD) | 0.61 |
| Skewness | 0.41501086 |
| Sum | 1070.25 |
| Variance | 0.92703198 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.58 | 4 | 1.8% |
| 3.74 | 3 | 1.4% |
| 4.79 | 3 | 1.4% |
| 4.07 | 3 | 1.4% |
| 4.96 | 3 | 1.4% |
| 6.04 | 3 | 1.4% |
| 4.38 | 3 | 1.4% |
| 4.18 | 3 | 1.4% |
| 3.69 | 3 | 1.4% |
| 5.32 | 3 | 1.4% |
| Other values (145) | 186 | 85.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.82 | 1 | 0.5% |
| 2.85 | 1 | 0.5% |
| 2.89 | 1 | 0.5% |
| 3.08 | 1 | 0.5% |
| 3.19 | 1 | 0.5% |
| 3.34 | 1 | 0.5% |
| 3.39 | 2 | 0.9% |
| 3.4 | 1 | 0.5% |
| 3.43 | 3 | 1.4% |
| 3.57 | 1 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8.18 | 1 | 0.5% |
| 7.34 | 1 | 0.5% |
| 7.09 | 2 | 0.9% |
| 6.97 | 1 | 0.5% |
| 6.82 | 1 | 0.5% |
| 6.81 | 1 | 0.5% |
| 6.79 | 1 | 0.5% |
| 6.78 | 1 | 0.5% |
| 6.73 | 2 | 0.9% |
| 6.69 | 1 | 0.5% |
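A total cholesterol of 2.82 to 8.18 mg/dL would be physiologically implausible, whereas the same numbers are typical in mmol/L. If `total_cholesterol_mg_dL` does hold mmol/L values, converting to mg/dL is a single multiplication. The sketch below assumes the commonly used factor of 38.67 mg/dL per mmol/L (from cholesterol's molar mass of about 386.65 g/mol); the constant and function names are hypothetical, and the unit should be confirmed against the study's data dictionary before the column is relabeled.

```python
# Assumption: total_cholesterol_mg_dL actually stores mmol/L values.
# 1 mmol/L of cholesterol is about 38.67 mg/dL.
MMOL_TO_MG_DL = 38.67


def cholesterol_mmol_to_mg_dl(value_mmol_l):
    """Convert a total cholesterol concentration from mmol/L to mg/dL."""
    return value_mmol_l * MMOL_TO_MG_DL


# Summary values copied from the tables above.
for label, value in [("Minimum", 2.82), ("median", 4.74), ("Mean", 4.9320276), ("Maximum", 8.18)]:
    print(label, round(cholesterol_mmol_to_mg_dl(value), 1))
```

Under that assumption the column spans roughly 109 to 316 mg/dL, a typical range for adult total cholesterol.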
cd4_correction_applied
Categorical
Quality flag: CD4 missing codes removed
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.7 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 217 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 434 | 66.7% |
| . | 217 | 33.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 434 | 66.7% |
| Other Punctuation | 217 | 33.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 434 | |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 217 | |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 651 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 434 | 66.7% |
| . | 217 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 651 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 434 | 66.7% |
| . | 217 | 33.3% |
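Each quality-flag column has a single distinct value across all 217 rows, which is why the overview marks them as constant. A minimal check, assuming the data are held as a mapping from column name to a list of values with `None` for missing cells (a hypothetical layout, not the profiler's internals), flags every column whose non-missing values collapse to one:

```python
def constant_columns(table):
    """Return {column: value} for columns whose non-missing values are all identical."""
    constants = {}
    for name, values in table.items():
        distinct = {v for v in values if v is not None}  # None marks a missing cell
        if len(distinct) == 1:
            constants[name] = distinct.pop()
    return constants


rows = 217
table = {
    "cd4_correction_applied": ["0.0"] * rows,
    "final_comprehensive_fix_applied": ["1.0"] * rows,
    "waist_circ_unit_correction_applied": [False] * rows,
    "example_varying_column": list(range(rows)),  # hypothetical non-constant column
}
print(constant_columns(table))
```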
final_comprehensive_fix_applied
Categorical
Quality flag: Comprehensive corrections applied
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.7 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 217 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 217 | 33.3% |
| . | 217 | 33.3% |
| 0 | 217 | 33.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 434 | 66.7% |
| Other Punctuation | 217 | 33.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 217 | |
| 0 | 217 | |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 217 | |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 651 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 217 | 33.3% |
| . | 217 | 33.3% |
| 0 | 217 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 651 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 217 | 33.3% |
| . | 217 | 33.3% |
| 0 | 217 | 33.3% |
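waist_circ_unit_correction_applied
Boolean
Quality flag: Waist circ unit corrected
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.9 KiB |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 217 | 100.0% |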
Correlations
| | ALT (U/L) | AST (U/L) | Age (at enrolment) | CD4 cell count (cells/µL) | HIV viral load (copies/mL) | Patient ID | Platelet count (×10³/µL) | Race | Sex | White blood cell count (×10³/µL) | hemoglobin_g_dL | jhb_subregion | latitude | longitude | total_cholesterol_mg_dL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALT (U/L) | 1.000 | 0.737 | -0.026 | 0.122 | 0.000 | -0.078 | -0.047 | 0.000 | 0.000 | 0.025 | 0.316 | 0.000 | 0.000 | 0.093 | -0.027 |
| AST (U/L) | 0.737 | 1.000 | -0.011 | 0.086 | 0.000 | -0.097 | -0.060 | 0.000 | 0.236 | -0.104 | 0.169 | 0.027 | 0.159 | 0.068 | -0.065 |
| Age (at enrolment) | -0.026 | -0.011 | 1.000 | 0.025 | 0.161 | 0.019 | -0.055 | 0.000 | 0.142 | -0.047 | 0.051 | 0.000 | 0.000 | 0.000 | 0.198 |
| CD4 cell count (cells/µL) | 0.122 | 0.086 | 0.025 | 1.000 | 0.000 | -0.096 | 0.251 | 0.000 | 0.227 | 0.617 | -0.040 | 0.000 | 0.000 | 0.000 | -0.014 |
| HIV viral load (copies/mL) | 0.000 | 0.000 | 0.161 | 0.000 | 1.000 | 1.000 | 0.293 | 0.000 | 0.000 | 0.055 | 0.117 | 0.000 | 0.000 | 0.000 | 0.000 |
| Patient ID | -0.078 | -0.097 | 0.019 | -0.096 | 1.000 | 1.000 | -0.088 | 1.000 | 1.000 | 0.061 | -0.030 | 1.000 | 1.000 | 1.000 | -0.001 |
| Platelet count (×10³/µL) | -0.047 | -0.060 | -0.055 | 0.251 | 0.293 | -0.088 | 1.000 | 0.000 | 0.346 | 0.369 | -0.340 | 0.000 | 0.000 | 0.000 | 0.119 |
| Race | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Sex | 0.000 | 0.236 | 0.142 | 0.227 | 0.000 | 1.000 | 0.346 | 0.000 | 1.000 | 0.261 | 0.595 | 0.000 | 0.000 | 0.000 | 0.012 |
| White blood cell count (×10³/µL) | 0.025 | -0.104 | -0.047 | 0.617 | 0.055 | 0.061 | 0.369 | 0.000 | 0.261 | 1.000 | 0.018 | 0.000 | 0.000 | 0.000 | 0.018 |
| hemoglobin_g_dL | 0.316 | 0.169 | 0.051 | -0.040 | 0.117 | -0.030 | -0.340 | 0.000 | 0.595 | 0.018 | 1.000 | 0.048 | 0.074 | 0.000 | 0.150 |
| jhb_subregion | 0.000 | 0.027 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.048 | 1.000 | 0.718 | 0.998 | 0.205 |
| latitude | 0.000 | 0.159 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.074 | 0.718 | 1.000 | 0.998 | 0.133 |
| longitude | 0.093 | 0.068 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.998 | 0.998 | 1.000 | 0.123 |
| total_cholesterol_mg_dL | -0.027 | -0.065 | 0.198 | -0.014 | 0.000 | -0.001 | 0.119 | 0.000 | 0.012 | 0.018 | 0.150 | 0.205 | 0.133 | 0.123 | 1.000 |
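The report does not state which coefficient fills each cell. For numeric pairs, a signed rank correlation such as Spearman's ρ is a plausible candidate, while the entries involving categorical columns (Sex, Race, jhb_subregion) lie between 0 and 1, which suggests an association measure such as Cramér's V there instead. The sketch below computes Spearman's ρ as the Pearson correlation of average ranks, so tied values are handled; it is an illustration under that assumption, not a reconstruction of the profiler's settings. The ALT and AST lists are the first ten sample rows shown below, so the result will not reproduce the full-sample 0.737.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


alt = [16.0, 8.0, 17.0, 11.0, 41.0, 14.0, 30.0, 64.0, 25.0, 28.0]
ast = [25.0, 15.0, 17.0, 12.0, 32.0, 17.0, 24.0, 43.0, 31.0, 26.0]
print(round(spearman(alt, ast), 3))
```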
Sample
First rows
| | anonymous_patient_id | Patient ID | study_source | primary_date | Age (at enrolment) | Sex | Race | latitude | longitude | province | city | jhb_subregion | CD4 cell count (cells/µL) | HIV viral load (copies/mL) | HIV_status | Antiretroviral Therapy Status | White blood cell count (×10³/µL) | Platelet count (×10³/µL) | hemoglobin_g_dL | ALT (U/L) | AST (U/L) | total_cholesterol_mg_dL | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HEAT_329E55DDD278 | 1 | JHB_WRHI_003 | 2016-07-19 | 30.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 1020.0 | 0.0 | Positive | Positive | 5.21 | 390.0 | 10.9 | 16.0 | 25.0 | 6.79 | 0.0 | 1.0 | False |
| 1 | HEAT_C8A77DD97D98 | 3 | JHB_WRHI_003 | 2016-07-19 | 53.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 446.0 | 0.0 | Positive | Positive | 3.68 | 234.0 | 13.5 | 8.0 | 15.0 | 4.93 | 0.0 | 1.0 | False |
| 2 | HEAT_A4407F8E079E | 4 | JHB_WRHI_003 | 2016-07-19 | 36.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 1054.0 | 40.0 | Positive | Positive | 7.71 | 344.0 | 13.3 | 17.0 | 17.0 | 5.19 | 0.0 | 1.0 | False |
| 3 | HEAT_7DCC7C7F1641 | 5 | JHB_WRHI_003 | 2016-07-19 | 47.0 | Female | Black | -26.2041 | 27.9394 | Gauteng | Johannesburg | Western_JHB | 989.0 | 0.0 | Positive | Positive | 6.35 | 257.0 | 10.8 | 11.0 | 12.0 | 6.69 | 0.0 | 1.0 | False |
| 4 | HEAT_32253618AEF8 | 6 | JHB_WRHI_003 | 2016-07-19 | 34.0 | Male | Black | -26.2309 | 27.8585 | Gauteng | Johannesburg | Western_JHB | 160.0 | 40.0 | Positive | Positive | 4.17 | 343.0 | 11.2 | 41.0 | 32.0 | 3.19 | 0.0 | 1.0 | False |
| 5 | HEAT_5C22FD95BF09 | 7 | JHB_WRHI_003 | 2016-07-21 | 40.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 989.0 | 40.0 | Positive | Positive | 7.09 | 319.0 | 15.1 | 14.0 | 17.0 | 5.58 | 0.0 | 1.0 | False |
| 6 | HEAT_A5DC9507FDDA | 8 | JHB_WRHI_003 | 2016-07-19 | 35.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 453.0 | 0.0 | Positive | Positive | 4.66 | 229.0 | 17.4 | 30.0 | 24.0 | 5.52 | 0.0 | 1.0 | False |
| 7 | HEAT_605625B419C8 | 9 | JHB_WRHI_003 | 2016-07-21 | 30.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 288.0 | 0.0 | Positive | Positive | 4.12 | 240.0 | 17.7 | 64.0 | 43.0 | 4.56 | 0.0 | 1.0 | False |
| 8 | HEAT_931A30042DAF | 10 | JHB_WRHI_003 | 2016-07-22 | 44.0 | Female | Black | -26.2309 | 27.8585 | Gauteng | Johannesburg | Western_JHB | 907.0 | 0.0 | Positive | Positive | 4.77 | 230.0 | 12.3 | 25.0 | 31.0 | 4.12 | 0.0 | 1.0 | False |
| 9 | HEAT_3E8A9CB371EE | 12 | JHB_WRHI_003 | 2016-07-25 | 36.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 509.0 | 40.0 | Positive | Positive | 4.72 | 186.0 | 15.5 | 28.0 | 26.0 | 4.26 | 0.0 | 1.0 | False |
Last rows
| | anonymous_patient_id | Patient ID | study_source | primary_date | Age (at enrolment) | Sex | Race | latitude | longitude | province | city | jhb_subregion | CD4 cell count (cells/µL) | HIV viral load (copies/mL) | HIV_status | Antiretroviral Therapy Status | White blood cell count (×10³/µL) | Platelet count (×10³/µL) | hemoglobin_g_dL | ALT (U/L) | AST (U/L) | total_cholesterol_mg_dL | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 207 | HEAT_1DEA5323412B | 337 | JHB_WRHI_003 | 2017-05-10 | 46.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 190.0 | 40.0 | Positive | Positive | 4.36 | 225.0 | 14.3 | 40.0 | 30.0 | 4.62 | 0.0 | 1.0 | False |
| 208 | HEAT_7F7338A00A97 | 338 | JHB_WRHI_003 | 2017-05-22 | 33.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 596.0 | 0.0 | Positive | Positive | 4.45 | 237.0 | 15.5 | 97.0 | 50.0 | 6.04 | 0.0 | 1.0 | False |
| 209 | HEAT_9BE1B22421BC | 342 | JHB_WRHI_003 | 2017-05-18 | 43.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 541.0 | 40.0 | Positive | Positive | 4.58 | 403.0 | 12.8 | 27.0 | 20.0 | 6.24 | 0.0 | 1.0 | False |
| 210 | HEAT_BE76BD19CBFC | 345 | JHB_WRHI_003 | 2017-05-17 | 42.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 530.0 | 0.0 | Positive | Positive | 6.31 | 245.0 | 17.6 | 39.0 | 22.0 | 6.73 | 0.0 | 1.0 | False |
| 211 | HEAT_8EC86E61B084 | 346 | JHB_WRHI_003 | 2017-05-16 | 39.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 390.0 | 0.0 | Positive | Positive | 4.29 | 254.0 | 11.6 | 14.0 | 14.0 | 4.46 | 0.0 | 1.0 | False |
| 212 | HEAT_43FDEDBF5846 | 347 | JHB_WRHI_003 | 2017-05-22 | 39.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 415.0 | 0.0 | Positive | Positive | 4.42 | 192.0 | 14.0 | 18.0 | 25.0 | 4.71 | 0.0 | 1.0 | False |
| 213 | HEAT_751D406DFAF2 | 348 | JHB_WRHI_003 | 2017-06-06 | 57.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 786.0 | 63.0 | Positive | Positive | 8.15 | 447.0 | 11.3 | 16.0 | 17.0 | 6.03 | 0.0 | 1.0 | False |
| 214 | HEAT_5A4AD8FB0DD2 | 349 | JHB_WRHI_003 | 2017-05-25 | 61.0 | Male | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 672.0 | 0.0 | Positive | Positive | 6.04 | 320.0 | 16.3 | 13.0 | 21.0 | 4.33 | 0.0 | 1.0 | False |
| 215 | HEAT_345F925E036F | 350 | JHB_WRHI_003 | 2017-05-26 | 37.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 520.0 | 0.0 | Positive | Positive | 4.22 | 248.0 | 12.0 | 16.0 | 22.0 | 5.84 | 0.0 | 1.0 | False |
| 216 | HEAT_797EB3CC686B | 351 | JHB_WRHI_003 | 2017-06-15 | 39.0 | Female | Black | -26.2041 | 28.0473 | Gauteng | Johannesburg | Central_JHB | 888.0 | 40.0 | Positive | Positive | 4.22 | 210.0 | 12.9 | 46.0 | 32.0 | 6.25 | 0.0 | 1.0 | False |